11 research outputs found

    Computational methods for functional interpretation of diverse omics data

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. By Sumaiya Nazeen. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 199-218).
    Recent technological advances have resulted in an explosive growth of various types of "omics" data, including genomic, transcriptomic, proteomic, and metagenomic data. Functional interpretation of these data is key to elucidating the potential role of different molecular levels (e.g., genome, transcriptome, proteome, metagenome) in human health and disease. However, the massive size and heterogeneity of raw data pose substantial computational and statistical challenges in integrating and interpreting these data. To overcome these challenges, we need sophisticated approaches and scalable analytical frameworks. This thesis outlines two research efforts along these lines. First, we develop a novel three-tiered integrative omics framework for integrating and functionally analyzing heterogeneous omics datasets across a group of co-occurring diseases. We demonstrate the effectiveness of this framework in investigating the shared pathophysiology of autism spectrum disorder (ASD) and its multi-organ-system co-morbid diseases (e.g., inflammatory bowel disease, asthma, muscular dystrophy, cerebral palsy) and uncover a novel innate immunity connection between them. Second, we develop a new end-to-end computational tool, Carnelian, for robust, alignment-free functional profiling of whole metagenome sequencing reads, that is uniquely suited to finding hidden functional trends across diverse data sets in comparative analysis. Carnelian can find shared metabolic pathways and concordant functional dysbioses, and can distinguish microbial metabolic function missed by state-of-the-art functional annotation tools. We demonstrate Carnelian's effectiveness on large-scale metagenomic studies of type-2 diabetes, Crohn's disease, Parkinson's disease, and industrialized versus non-industrialized cohorts.

    Integrative analysis of heterogeneous genomic datasets to discover genetic etiology of autism spectrum disorders

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. By Sumaiya Nazeen. Cataloged from PDF version of thesis. Includes bibliographical references (pages 99-109).
    Understanding the genetic background of complex diseases is crucial to medical research, with implications for diagnosis, treatment, and drug development. As molecular approaches to this challenge are time-consuming and costly, computational approaches offer an efficient alternative. Such approaches aim at predicting and prioritizing genes for a particular disease of interest. State-of-the-art gene prediction and prioritization methods rely on the observation that disease-causing genes share functional similarity based on sequence, phenotype, protein-protein interaction (PPI) network, or functional annotation. Another increasingly accepted view is that human diseases result from perturbations of molecular networks, and genes causing the same or similar diseases tend to be close to one another in molecular networks. Such observations form the basis for a large collection of computational approaches to find previously unknown genes associated with certain diseases. The majority of the methods are designed based on protein interactome networks, with integration of other large-scale omics data, to infer how likely it is that a gene is associated with a disease. In this thesis, we set out to address the outstanding challenge of understanding the genetic etiology of autism spectrum disorder (ASD), which refers to a group of complex neurodevelopmental disorders sharing the common feature of dysfunctional reciprocal social interaction. We introduce three novel methods for computing how likely a given gene is to be involved in ASDs based on copy number variations (CNVs), phenotype similarity, and protein interactome network topology. We also customize a random walk with restarts algorithm for ASD gene prioritization for the first time. Finally, we provide a novel integrative approach for combining CNV, phenotype similarity, and topology-related information with existing knowledge from literature. Our integrative approach outperforms the individual schemes in identifying and ranking ASD-related genes. Our candidate gene set is overrepresented in a number of signaling, cell-adhesion, and neurological pathways, molecular functions, and biological processes that are worth further investigation in connection with ASDs. We also find evidence for a connection between gastrointestinal disorders, particularly inflammatory bowel diseases (IBD), and ASDs. The subnetworks we identify indicate the possible existence of subclasses of disorders along the autism spectrum.

    Carnelian uncovers hidden functional patterns across diverse study populations from whole metagenome sequencing reads

    Microbial populations exhibit functional changes in response to different ambient environments. Although whole metagenome sequencing promises enough raw data to study those changes, existing tools are limited in their ability to directly compare microbial metabolic function across samples and studies. We introduce Carnelian, an end-to-end pipeline for metabolic functional profiling uniquely suited to finding functional trends across diverse datasets. Carnelian is able to find shared metabolic pathways and concordant functional dysbioses, and to distinguish Enzyme Commission (EC) terms missed by existing methodologies. We demonstrate Carnelian’s effectiveness on type 2 diabetes, Crohn’s disease, Parkinson’s disease, and industrialized and non-industrialized gut microbiome cohorts.

    Additional file 4 of Integrative analysis of genetic data sets reveals a shared innate immune component in autism spectrum disorder and its co-morbidities

    Pathway enrichment analysis. This Excel file contains hypergeometric test p values per pathway per disease for the KEGG, BioCarta, Reactome, and PID pathway collections, as well as for all canonical pathway gene sets collected from MSigDB version 4.0, and Fisher’s combined p values for ASD and its co-morbidities. (XLS 1444 kb)

    Capability of 19-L polycarbonate plastic water cooler containers for efficient solar water disinfection (SODIS): Field case studies in India, Bahrain and Spain

    The small treated volume (typically ~2 L) associated with the polyethylene terephthalate (PET) bottles most frequently used in solar water disinfection (SODIS) is a major obstacle to uptake of this water treatment technology in resource-poor environments. To address this problem, we conducted a series of experiments in Spain, Bahrain and India to assess the efficacy of large-volume (19 L) transparent plastic (polycarbonate) water cooler/dispenser containers (WDCs) as SODIS reactors to inactivate Escherichia coli and Enterococcus faecalis under strong natural sunlight. Reduction values of 6 log10 units (LRV = 6.0) were observed using WDCs in each location. Additional comparisons between 2-L PET bottles and 19-L WDCs indicate that bacterial inactivation is similar in both systems. SODIS experiments in turbid water (100 NTU) in both reactors showed very good inactivation efficiency. LRVs of 6 were obtained for E. coli in both WDCs and 2-L PET bottles, and in the case of E. faecalis, LRVs of 5 and 6 were observed in Spain and Bahrain, respectively. These studies demonstrate that under conditions of strong sunlight and mild temperature, 19-L water dispenser containers can be used to provide adequate volumes of SODIS-treated water for households or larger community applications such as schools or clinics in the developing world.

    The missing link between genetic association and regulatory function

    The genetic basis of most traits is highly polygenic and dominated by non-coding alleles. It is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite the availability of gene expression and epigenomic datasets, few variant-to-gene links have emerged. It is unclear whether these sparse results are due to limitations in available data and methods, or to deficiencies in the underlying assumed model. To better distinguish between these possibilities, we identified 220 gene–trait pairs in which protein-coding variants influence a complex trait or its Mendelian cognate. Despite the presence of expression quantitative trait loci near most GWAS associations, by applying a gene-based approach we found limited evidence that the baseline expression of trait-related genes explains GWAS associations, whether using colocalization methods (8% of genes implicated), transcription-wide association (2% of genes implicated), or a combination of regulatory annotations and distance (4% of genes implicated). These results contradict the hypothesis that most complex trait-associated variants coincide with homeostatic expression QTLs, suggesting that better models are needed. The field must confront this deficit and pursue this ‘missing regulation’.

    The Parkinson’s disease protein alpha-synuclein is a modulator of processing bodies and mRNA stability

    Alpha-synuclein (αS) is a conformationally plastic protein that reversibly binds to cellular membranes. It aggregates and is genetically linked to Parkinson's disease (PD). Here, we show that αS directly modulates processing bodies (P-bodies), membraneless organelles that function in mRNA turnover and storage. The N terminus of αS, but not other synucleins, dictates mutually exclusive binding either to cellular membranes or to P-bodies in the cytosol. αS associates with multiple decapping proteins in close proximity on the Edc4 scaffold. As αS pathologically accumulates, aberrant interaction with Edc4 occurs at the expense of physiologic decapping-module interactions. mRNA decay kinetics within PD-relevant pathways are correspondingly disrupted in PD patient neurons and brain. Genetic modulation of P-body components alters αS toxicity, and human genetic analysis lends support to the disease-relevance of these interactions. Beyond revealing an unexpected aspect of αS function and pathology, our data highlight the versatility of conformationally plastic proteins with high intrinsic disorder.